External validation of cardiac arrest-specific prognostication scores developed for early prognosis estimation after out-of-hospital cardiac arrest in a Korean multicenter cohort

We evaluated the performance of cardiac arrest-specific prognostication scores developed for outcome prediction in the early hours after out-of-hospital cardiac arrest (OHCA) in predicting long-term outcomes using independent data. The following scores were calculated for 1,163 OHCA patients who were treated with targeted temperature management (TTM) at 21 hospitals in South Korea: OHCA, cardiac arrest hospital prognosis (CAHP), C-GRApH (named on the basis of its variables), TTM risk, 5-R, NULL-PLEASE (named on the basis of its variables), Serbian quality of life long-term (SR-QOLl), cardiac arrest survival, revised post-cardiac arrest syndrome for therapeutic hypothermia (rCAST), Polish hypothermia registry (PHR) risk, and PROgnostication using LOGistic regression model for Unselected adult cardiac arrest patients in the Early stages (PROLOGUE) scores and prediction score by Aschauer et al. Their accuracies in predicting poor outcome at 6 months after OHCA were determined using the area under the receiver operating characteristic curve (AUC) and calibration belt. In the complete-case analyses, the PROLOGUE score showed the highest AUC (0.923; 95% confidence interval [CI], 0.904–0.941), whereas the SR-QOLl score had the lowest AUC (0.749; 95% CI, 0.711–0.786). The discrimination performances were similar in the analyses after multiple imputation. The PROLOGUE, TTM risk, CAHP, NULL-PLEASE, 5-R, and cardiac arrest survival scores were well calibrated. The rCAST and PHR risk scores showed acceptable overall calibration, although they showed miscalibration under the 80% CI level at extreme prediction values. The OHCA score, C-GRApH score, prediction score by Aschauer et al., and SR-QOLl score showed significant miscalibration in both complete-case (P = 0.026, 0.013, 0.005, and < 0.001, respectively) and multiple-imputation analyses (P = 0.007, 0.018, < 0.001, and < 0.001, respectively). 
In conclusion, the discrimination performances of the prognostication scores were all acceptable, but some showed significant miscalibration.

Introduction
Out-of-hospital cardiac arrest (OHCA) remains the leading cause of mortality and disability worldwide [1,2]. Most of the patients resuscitated from OHCA eventually die in hospital or develop severe neurologic sequelae; only 10%-30% survive with good neurologic outcome [2,3]. Current guidelines recommend delaying neurologic prognosis estimation in comatose cardiac arrest patients until at least 72 h after return of spontaneous circulation (ROSC) [4]. However, there is a need for an accurate prognostic tool useful during the early hours after OHCA. In the case of comatose OHCA patients, families desire precise information on the neurologic prognoses as early as possible. Treating physicians often have to make critical decisions regarding the use of costly and resource-intensive therapies, such as extracorporeal membrane oxygenation (ECMO), in the early stages of post-cardiac arrest care, when the patients' neurologic prognoses are uncertain.
Several cardiac arrest-specific prognostication scores for use in the early hours after OHCA have been developed from retrospective or prospective analyses of OHCA data [5][6][7][8][9][10][11][12][13][14][15][16]. These scores have several limitations that must be addressed to render them useful in clinical practice. A risk prediction score derived from one study population may not be accurate in other populations. Thus, external validations in various patient populations are required to enable widespread reliance on a risk prediction score, but few such scores have undergone any external validation using independent data; where this has been done, it was usually limited to retrospective analyses of discrimination performance [7,9,[17][18][19][20][21][22]. Most of the scores are intended to predict short-term outcomes, such as survival to hospital discharge or neurologic outcome at hospital discharge [5-8, 10-12, 14-16], and have not been evaluated as a means to predict long-term outcomes. Targeted temperature management (TTM) is now the standard treatment for comatose OHCA patients. However, several scores were developed before the widespread use of TTM or derived from studies that included OHCA patients irrespective of whether they had undergone TTM [5-7, 10, 12-14].
To address these limitations, we sought to evaluate the performance of cardiac arrest-specific prognostication scores developed for outcome prediction in the early hours after OHCA in predicting long-term outcomes, using independent data from a multicenter registry of comatose OHCA patients who underwent TTM. We hypothesized that the scores would accurately predict long-term outcomes in an independent cohort of OHCA patients who underwent TTM.

Study design and setting
This study conformed to the principles outlined in the Declaration of Helsinki. It was a retrospective analysis of data from the Korean Hypothermia Network prospective (KORHN-pro) registry, which enrolled adult OHCA patients treated with TTM at 22 teaching hospitals in the Republic of Korea [3]. In brief, a principal investigator at each participating hospital reviewed the medical records of patients who were eligible for registry enrollment and collected their demographic, prehospital resuscitation, in-hospital treatment, and outcomes data in an anonymous fashion using a web-based case report form based on the Utstein Resuscitation Registry Templates [23]. Data quality was assured by five clinical research associates who queried any concerns with the investigators, and a data manager with final responsibility for determining data acceptability. The study design and registry protocol were approved by the institutional review board of all participating hospitals, including the Chonnam National University Hospital Institutional Review Board (CNUH-2015-164) and registered at the International Clinical Trials Registry Platform (ClinicalTrials.gov identifier: NCT02827422). Written informed consent was obtained from the legal surrogates of all patients enrolled in the registry.

Study population
The KORHN-pro registry included all adult (≥ 18 years) unconscious (Glasgow Coma Scale [GCS] score < 8) OHCA survivors treated with TTM at participating hospitals between October 2015 and December 2018, except those with the following conditions: OHCA associated with hemorrhagic or ischemic stroke; poor pre-arrest neurologic status (cerebral performance category [CPC] of 3 or 4); body temperature < 30 °C on admission; pre-arrest do-not-resuscitate order; or known terminal illness leading to life expectancy < 6 months. One of the scores included in this study (PROLOGUE [PROgnostication using LOGistic regression model for Unselected adult cardiac arrest patients in the Early stages]) was developed using data from one of the participating hospitals [7]. Thus, patients enrolled from this center were excluded from this study, as were patients without data on outcomes at 6 months. The patients included in the registry were managed according to the treatment protocols of each hospital.

Variables
Data on the following variables were obtained for each patient: age, sex, hospital, pre-existing chronic diseases (coronary artery disease, heart failure, arrhythmia, cerebrovascular accident [CVA], neurologic disease other than CVA, diabetes, hypertension, pulmonary disease, chronic kidney disease, liver cirrhosis, and malignancy), patient location at the time of cardiac arrest, presence of a witness to the collapse, bystander cardiopulmonary resuscitation (CPR), first monitored rhythm, no-flow duration, low-flow duration, time to ROSC, dose of epinephrine given during CPR, etiology of cardiac arrest, circulatory status on emergency department arrival (prehospital ROSC), GCS motor score and pupillary light reflex obtained before intensive care unit (ICU) admission, initial laboratory parameters after ROSC (lactate, arterial pH, partial pressure of arterial oxygen [PaO2], partial pressure of arterial carbon dioxide [PaCO2], potassium, phosphate, creatinine, glucose, and hemoglobin), duration and target temperature of TTM, Sequential Organ Failure Assessment score on the first day after hospital admission, occurrence of rearrest before ICU admission, critical care interventions implemented during hospitalization (coronary angiography and ECMO), length of hospital stay, and CPC at 6 months after OHCA. No-flow and low-flow durations were defined as the time interval from collapse to first CPR attempt and the time interval from first CPR attempt to ROSC, respectively. Time to ROSC was defined as the time interval from collapse to ROSC. CPC at 6 months after OHCA was evaluated through in-person or telephone interviews conducted by medical staff at each center who were blinded to patient data. A CPC of 1 or 2 was defined as a good outcome and a CPC of 3-5 as a poor one (primary outcome).
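The interval definitions and the outcome dichotomization described above can be illustrated with a minimal Python sketch. The function names and the use of minutes on a common clock are assumptions made for illustration only; they do not describe the registry's actual data-processing code.

```python
def derive_intervals(collapse_min: float, first_cpr_min: float, rosc_min: float) -> dict:
    """Derive the three resuscitation intervals (in minutes).

    All three inputs are hypothetical timestamps expressed in minutes
    on the same clock (an assumption for this sketch).
    """
    no_flow = first_cpr_min - collapse_min   # collapse -> first CPR attempt
    low_flow = rosc_min - first_cpr_min      # first CPR attempt -> ROSC
    time_to_rosc = rosc_min - collapse_min   # collapse -> ROSC
    return {"no_flow": no_flow, "low_flow": low_flow, "time_to_rosc": time_to_rosc}


def is_poor_outcome(cpc: int) -> bool:
    """Primary outcome: CPC 1-2 is a good outcome, CPC 3-5 a poor one."""
    if cpc not in (1, 2, 3, 4, 5):
        raise ValueError("CPC must be an integer from 1 to 5")
    return cpc >= 3
```

By these definitions, time to ROSC is the sum of the no-flow and low-flow durations.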
After literature review, the following cardiac arrest-specific prognostication scores were selected based on availability of the data required for score calculation and were calculated using the formulas presented in the original publications. The scores were as follows: OHCA [5]; cardiac arrest hospital prognosis (CAHP) [6]; PROLOGUE [7]; C-GRApH [8], named on the basis of its variables; TTM risk [9]; prediction score by Aschauer et al. [10]; 5-R [11]; NULL-PLEASE [12], named on the basis of its variables; Serbian quality of life long-term (SR-QOLl) [13]; cardiac arrest survival [14]; revised post-cardiac arrest syndrome for therapeutic hypothermia (rCAST) [15]; and Polish hypothermia registry (PHR) risk [16]. The characteristics of these scores are summarized in Table 1. A greater risk of poor outcome is indicated by lower scores for the 5-R and SR-QOLl scores, but otherwise by higher scores.
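Because the 5-R and SR-QOLl scores run in the opposite direction to the other ten scores, one way to handle them uniformly in a discrimination analysis is to orient every score so that higher values indicate greater risk. The sketch below assumes simple negation, which is an illustrative choice and not necessarily the approach taken in this study; the score labels are used as plain dictionary-style keys.

```python
# Scores for which a LOWER value indicates a greater risk of poor outcome.
INVERTED_SCORES = {"5-R", "SR-QOLl"}


def risk_oriented(score_name: str, value: float) -> float:
    """Return the value oriented so that higher always means greater risk.

    Negation preserves the ranking of patients in reverse order, so
    rank-based measures such as the AUC remain well defined.
    """
    return -value if score_name in INVERTED_SCORES else value
```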

Statistical analysis
Data analysis and reporting were performed in accordance with the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis statement [24]. The sample size of this study far exceeded the suggested minimum sample size for external validation studies of multivariable prediction models [25,26]. Statistical analyses were conducted using T&F programme version 3.0 (YooJin BioSoft, Goyang, Republic of Korea) and R language version 4.0.3 (R Foundation for Statistical Computing, Vienna, Austria). Continuous variables are presented as medians with interquartile ranges, unless otherwise specified. Categorical variables are expressed as numbers of cases with percentages. Comparisons between two independent groups were performed using the Mann-Whitney U test for continuous variables and the chi-square test with continuity correction for categorical variables. To determine the association of each prognostication score with the primary outcome, binary logistic regression analyses were performed after dividing the patients into two groups according to the optimal cut-off for each score. The discrimination abilities of the prognostication scores were assessed using receiver operating characteristic (ROC) analysis and quantified with the area under the ROC curve (AUC). The AUC values were compared in a pairwise manner using the method of DeLong et al. [27]. For each score, sensitivity, specificity, positive predictive value, negative predictive value, and accuracy were calculated for the optimal cut-off, determined using the Youden index. The calibration performances of the prognostication scores were assessed using the calibration belt [28,29]. To allow for comparisons between scores, score performances were initially evaluated for patients for whom all 12 score values were calculable.
To evaluate the robustness of the results, missing values of the variables required for the calculation of the prognostication scores were imputed using the MICE package in R, and the performances of the prognostication scores were reassessed. Statistical significance was indicated by a two-sided P-value of < 0.05.
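The two discrimination quantities named above, the AUC and the Youden-index cut-off, can be sketched in plain Python as follows. This is a minimal illustration, not the code used in the study (which relied on T&F and R). It assumes that scores are oriented so that higher values indicate greater risk, that labels are coded 1 for poor outcome and 0 for good outcome, and that a patient is classified as high risk when the score is at or above the cut-off. It does not cover confidence intervals or the DeLong comparison.

```python
def auc(scores, labels):
    """AUC as the probability that a randomly chosen poor-outcome patient
    has a higher score than a randomly chosen good-outcome patient
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing
    J = sensitivity + specificity - 1 over all observed score values.
    If several cut-offs tie, the lowest one is returned."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes must be present")
    best = None
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < c)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best[1], best[2], best[3]
```

The pairwise AUC loop is quadratic in the number of patients, which is adequate for a cohort of this size; a rank-based formulation would be preferable for much larger datasets.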

Results
A total of 1,373 adult OHCA patients treated with TTM were enrolled in the KORHN-pro registry. Among these, 187 who were enrolled from the hospital involved in the development of PROLOGUE and 23 without data on CPC at 6 months after OHCA were excluded from this study, leaving 1,163 included patients (Fig 1). These were mostly male (70.9%), with a median age of 58.3 years (interquartile range, 46.8-69.9). The majority of the patients had a witnessed cardiac arrest (71.6%), received bystander CPR (63.1%), and presented with a non-shockable initial cardiac arrest rhythm (63.2%). The median no-flow duration, low-flow duration, and time to ROSC were 1.0 (0.0-6.0), 25.0 (14.0-38.0), and 30.0 (18.0-43.0) min, respectively. The cardiac arrest was cardiac in origin in 714 (61.4%) patients. Four hundred (34.4%) patients underwent coronary angiography, and 57 (4.9%) received ECMO during hospitalization. Of the included patients, 357 (30.7%) had a good outcome 6 months after OHCA, while the remaining 806 (69.3%) patients had a poor outcome. The clinical and laboratory characteristics of patients, stratified by outcomes at 6 months after OHCA, are summarized in Table 2. As shown in Table 2, all 12 of the prognostication scores in the present study were significantly associated with the primary outcome (all P < 0.001).
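The patient flow and outcome proportions reported above can be checked with a few lines of arithmetic. The variable names are illustrative; the counts are taken directly from the text.

```python
# Counts reported in the text.
enrolled = 1373
excluded_prologue_center = 187   # enrolled at the PROLOGUE development hospital
excluded_no_6_month_cpc = 23     # no CPC data at 6 months

included = enrolled - excluded_prologue_center - excluded_no_6_month_cpc

good, poor = 357, 806            # outcomes at 6 months among included patients


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)


print(included)              # number of included patients
print(pct(good, included))   # percentage with good outcome
print(pct(poor, included))   # percentage with poor outcome
```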

Prognostic performances of the scores
There was a total of 804 patients for whom all 12 prognostication scores were calculable, of whom 241 (30.0%) had a good outcome and 563 (70.0%) had a poor outcome. In binary logistic regression analyses examining the association between scores above the optimal cut-off (below the optimal cut-off for the 5-R and SR-QOLl scores) and the risk of poor outcome at 6 months after OHCA for each score (Fig 2), the odds ratios ranged from 6.813 (C-GRApH score) to 32.143 (PROLOGUE). The discrimination abilities of the prognostication scores in these patients are shown in Fig 3 and Table 3. All scores could predict poor outcome at 6 months after OHCA with statistical significance (all P < 0.001). PROLOGUE showed the highest AUC (0.923; 95% confidence interval [CI], 0.904-0.941), whereas the SR-QOLl score had the lowest AUC (0.749; 95% CI, 0.711-0.786). All scores showed similar AUC in the analyses after multiple imputation (Table 4). Table 5 shows sensitivity, specificity, positive and negative predictive values, and accuracy for different cut-offs. The results of pairwise comparisons of the ROC curves are summarized in Table 6. The calibration performances of the prognostication scores in the 804 patients are shown in Fig 4. Calibration belts for the PROLOGUE, TTM risk, CAHP, NULL-PLEASE, 5-R, and cardiac arrest survival scores contained bisecting lines (representing perfect calibration) across the entire range of predictions. For the rCAST and PHR risk scores, the 80% CI boundaries of the calibration belt did not contain bisecting lines at extreme predicted probability values, although such lines were present in the 95% CI boundaries of calibration belts across the entire range of predictions (P = 0.060 and 0.114, respectively). Calibration belts for the prediction score by Aschauer et al. (P = 0.005), OHCA score (P = 0.026), C-GRApH score (P = 0.013), and SR-QOLl score (P < 0.001) significantly deviated from the bisecting line. 
This was also true in the analyses following inclusion of imputed data (prediction score by Aschauer et al., P < 0.001; OHCA score, P = 0.007; C-GRApH score, P = 0.018; and SR-QOLl score, P < 0.001).
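Once a score is dichotomized at a cut-off, the performance measures reported in Table 5 and the odds ratio from a logistic regression with that single binary predictor all follow from the same 2 x 2 table. The sketch below uses hypothetical cell counts, not data from this study, and assumes that no cell is zero.

```python
def two_by_two_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Performance measures from a 2 x 2 table.

    tp: high-risk by score, poor outcome    fp: high-risk by score, good outcome
    fn: low-risk by score, poor outcome     tn: low-risk by score, good outcome
    Assumes all four cells are non-zero (otherwise a division fails).
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        # For a single binary predictor, the logistic regression odds ratio
        # equals the cross-product ratio of the 2 x 2 table.
        "odds_ratio": (tp * tn) / (fp * fn),
    }


# Hypothetical counts for illustration only.
metrics = two_by_two_metrics(tp=80, fp=10, fn=20, tn=90)
```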

Discussion
We evaluated the performances of 12 existing prediction scores developed for early prognosis estimation after OHCA in predicting poor outcome at 6 months after cardiac arrest using independent data from a multicenter registry of comatose OHCA patients who underwent TTM. In this study, the discrimination performances of the scores were all acceptable, some even being excellent. However, some scores (prediction score by Aschauer et al., OHCA score, C-GRApH score, and SR-QOLl score) showed significant miscalibration. To the best of our knowledge, this is the largest study to evaluate the performances of multiple cardiac arrest-specific prognostication scores in an East Asian population.
Our study population differed in many aspects from the original study populations used to develop the scores included in this study. Most of the scores were derived from studies conducted in European countries or the United States [5, 6, 8-10, 12-14, 16], where the prehospital and in-hospital care processes are quite different from Korean practice. In the patient populations used to derive the C-GRApH, TTM risk, 5-R, and PHR risk scores [8,9,11,16], the proportion with initial shockable rhythm was over 85%; in contrast, this proportion was only 36.8% in our study. The proportion of witnessed arrest was 71.6% in our study population, whereas it was higher than 85% in the study populations for the CAHP score, C-GRApH score, TTM risk score, 5-R score, and prediction score by Aschauer et al. [6,[8][9][10][11]. In contrast to our study population, only 11% and 51.7% of patients were treated with TTM in the studies generating the OHCA and PROLOGUE scores, respectively [5,7]. In addition, the primary outcome of our study was poor outcome at 6 months after OHCA, whereas most of the scores were developed for prediction of outcomes at hospital discharge [5-8, 11, 12, 14, 16]. Despite these differences, the PROLOGUE, TTM risk, CAHP, NULL-PLEASE, 5-R, and cardiac arrest survival scores demonstrated satisfactory discrimination and calibration performances for predicting poor outcome at 6 months after OHCA. Although the calibration performance was not perfect, the rCAST and PHR risk scores also showed acceptable overall calibration and decent discrimination performances. These results not only support the robustness and generalizability of these scores, but also extend their applicability to the prediction of long-term outcomes.
Prognostication scores commonly estimate outcomes using a combination of predictor variables selected through logistic regression. However, the studied scores vary greatly in terms of complexity. The prediction score by Aschauer et al. is composed of only four variables, whereas PROLOGUE is composed of 12 variables. Some scores are simply calculated as the sum of points awarded for each of the variables that are present [8][9][10][11][12][13][14], whereas others are calculated using complex formulas or nomograms [5-7, 15, 16]. Among those in this study, the PROLOGUE, TTM risk, and CAHP scores showed outstanding predictive performance (median AUC values > 0.9), but these scores require elaborate calculations, as they use a relatively complex nomogram or multi-point scoring system with a different weight for each parameter. Although these scores are relatively complex, this would not hinder practicality for clinical use if they could be calculated electronically using a desktop calculator or mobile device.
In this study, the prediction score by Aschauer et al., OHCA score, C-GRApH score, and SR-QOLl score showed acceptable discrimination but significant miscalibration. The prediction score by Aschauer et al. and C-GRApH score overestimated the actual risk of poor outcome at extreme predicted probability values, whereas the OHCA score and SR-QOLl score underestimated it. Although the calibration performances of the prediction score by Aschauer et al., C-GRApH score, and SR-QOLl score, to the best of our knowledge, have not been evaluated in separate studies, the low calibration capacity of the OHCA score for predicting poor outcome (CPC 3-5) at 6 months after OHCA has also been reported by other researchers [9,19]. Our study suggests that these scores need to be updated for use in settings similar to ours.
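Over- and underestimation of risk can be illustrated by comparing the mean predicted probability with the observed event rate within bins of predicted risk. The sketch below is a simpler stand-in for the calibration belt used in this study, not an implementation of it: it assumes equal-width bins and does not produce confidence bands or a P-value. A bin in which the mean predicted probability exceeds the observed rate indicates overestimation; the reverse indicates underestimation.

```python
def calibration_by_bin(predicted, observed, n_bins=5):
    """Group patients into equal-width bins of predicted probability and
    return a list of (mean_predicted, observed_rate, n) for non-empty bins.

    predicted: predicted probabilities of poor outcome, each in [0, 1]
    observed:  outcomes coded 1 (poor) or 0 (good)
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, observed):
        # A probability of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    rows = []
    for members in bins:
        if not members:
            continue
        n = len(members)
        mean_predicted = sum(p for p, _ in members) / n
        observed_rate = sum(y for _, y in members) / n
        rows.append((mean_predicted, observed_rate, n))
    return rows
```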
These scores would allow treating physicians to provide a patient's likely long-term outcome in a more objective manner in the early hours after OHCA. Although the prognostication scores in the present study could predict poor outcome with statistical significance, they were not specific enough to be used for important therapeutic decision-making (e.g., withholding or withdrawing life-saving treatment). These scores can be used as an adjunct to guide therapeutic decision-making. However, given the insufficient specificities observed in this study, important therapeutic decisions should not be made based on these prognostication scores alone.
Our study has several limitations. First, it was a retrospective analysis of data collected from teaching hospitals in the Republic of Korea. The performances of prognostication scores may be different in other healthcare or country settings. Second, we evaluated the performances of prognostication scores, but could not assess their clinical usefulness. Further studies are required to evaluate this. Third, we could not evaluate several cardiac arrest-specific prognostication scores that required variables unavailable from our registry data [19,[30][31][32]. Lastly, the treating physicians were not blinded to the constituent results of the prognostication scores, thereby introducing the potential for self-fulfilling prophecy bias.

Conclusions
We evaluated the performances of 12 existing cardiac arrest-specific prognostication scores in predicting poor outcome at 6 months after OHCA using data from a multicenter registry of comatose OHCA patients who underwent TTM. The PROLOGUE, TTM risk, CAHP, NULL-PLEASE, 5-R, and cardiac arrest survival scores showed satisfactory discrimination and calibration performances. Although the calibration performance was not perfect, the rCAST and PHR risk scores also showed acceptable overall calibration and good discrimination performances. The prediction score by Aschauer et al., OHCA score, C-GRApH score, and SR-QOLl score showed acceptable discrimination but significant miscalibration. None of the prognostication scores in this study were specific enough to be used alone in important therapeutic decision-making. These study findings may improve our understanding of these prognostication scores and thereby aid in the interpretations of the prediction results.